From: PHL Geo Inc. TO: Philadelphia City Council Subject: Introduction of a new development monitoring tool
Dear City Council Members,
We are PHL Geo Inc., a Philadelphia based civil-service oriented coding organization using the power of geospatial analysis and city-owned data to provide informed insights to decision makers such as yourself. We are reaching out to introduce an exciting new tool which can be used in your districts immediately. The tool is called the GENTRISK and uses 10 years of development information to predict where development may happen in your community in the near future. The GENTRISK may be used in your district to manage the negative impacts of development, including trash dumping, street closures, and noise complaints. Getting ahead of development impacts can improve coordination and communication with residents in your district, improve the likelihood of successful permit applications, and enable smart zoning and growth in your city council region.
Development is a process determined by numerous factors and can have positive effects; however, development-induced gentrification, which is “a process a neighborhood change that includes economic change in a historically disinvested neighborhood - by means of real estate investment and new higher-income residents moving in - as well as demographic change - not only in terms of income level, but also in terms of changes in the education level or racial make-up of residents,” can have negative effects (Demsas, 2021). As city council members looking to promote econoimc development while ensuring that residents have the protections that they need, the GENTRISK can provide tools that will help you see where development pressures are happening and engage in proactive outreach to those communities. Aligning city services such as increaseing access to Philadelhpia’s Homestead Exemption or to the PA Homeowner Assistance fund, can reduce these negative impacts.
The GENTRISK uses information from the past 10 years of development and combines that with external factors including the distance to schools, tree density, and distance to public transit, in order to understand what is driving development in Philadelphia. The GENTRISK then uses the relationships between those factors to predict where development may occur so you can get ahead of any potential negative impacts. These predictions form the basis of several tools available in GENTRISK, such as scenario planning. In the scenario planning framework, you may modify inputs to see how they will impact development going forward. For example, if the new Philly Tree Plan is likely to bring more tree canopy to your community, you can use the GENTRISK to model how much further development is likely to occur.
The following document provides a technical overview of how the GENTRISK functions, how to interpret its results, and how it can be incorporated into your city planning techniques. We look forward to seeing GENTRISK integrated into your city planning toolkit.
For questions, comments, concerns, or to learn more, please email phlgeo@gmail.com
Kindly, Akira DiSandro Benjamin Myers
PHL GEO Inc. Founders
The GENTRISK tool uses the construction of new housing and commercial buildings as its core output, as construction provides both opportunities (in the form of economic development and increased tax base) and pressures (in the form of gentrification and noise/trash) on residents and neighborhoods. Construction information is available from the Department of Licenses and Inspection for a time period stretching over the last 10 years. While neighborhoods are constantly in a state of change, focusing in on a more specific time period can provide City Council with an accurate understanding of what is driving development in the city. THe time period from 2013 to 2018 provides a good baseline of understanding development drivers as it is prior to the impacts that the COVID-19 global pandemic had on development drivers, yet recent enough that certain key drivers - such as the turnover of permanent residents for renters - are still prevalent within Philadelphia.
The model powering GENTRISK requires that all of the variables are gathered in the same format and then compared across time. To facilitatate this process, the PHL GEO Inc. team used the following code to gather all of the variables and put them into a similar format. These variables can then provide an understnaidng of how neighborhoods are changing over time, followed by an investigation of what could be driving those changes; ie, if development is happening in a neighborhood, what are the factors that drive that development?
A key way to contextualize these changes is to look at the changes that are happening within city council districts throughout Philadelphia. AS is shown through the map below, there are 10 city council districts in Philadelphia where changes and development pressures will have to be accounted for.
# philly permit data
dat_permit <- st_read("https://phl.carto.com/api/v2/sql?q=SELECT+*+FROM+permits&filename=permits&format=geojson&skipfields=cartodb_id")
# only keep new construction permits
newcon_permits <- dat_permit %>%
filter(grepl("NEW CON|NEWCON",typeofwork)) %>%
mutate(year = substr(permitissuedate, 1,4)) %>%
filter(year %in% c(2013:2019,2022)) %>%
st_transform(crs = 2272)
rm(dat_permit) # remove this big file because we won't use this anymore
city_council <- st_read("https://opendata.arcgis.com/api/v3/datasets/1ba5a5d68f4a4c75806e78b1d9245924_0/downloads/data?format=geojson&spatialRefId=4326&where=1%3D1")
city_council <- city_council %>%
mutate(DISTRICT = factor(DISTRICT, levels = 1:10))
# census data
acs_variable_list.2019 <- load_variables(2019, #year
"acs5", #five year ACS estimates
cache = TRUE) %>%
filter(geography == "block group")
# variables of interest:
# B19013_001: medHHincome
# B01001_001: total pop
# B02001_002: white pop
# B25058_001: median rent
# B15003_022: attainment of bachelor's of population 25+
# B25071_001: Median gross rent as a percentage of household income (past year)
# B07201_001: Geographical Mobility in the Past Year for Current Residence--Metropolitan Statistical Area Level in the United States, estimated total
# B07201_002: Geographical Mobility in the Past Year for Current Residence--Metropolitan Statistical Area Level in the United States, same house 1 year ago
# B25008_002: total population in occupied housing by tenure, owner occupied
# B25008_003: total population in occupied housing by tenure, renter occupied
# B99082_001: allocation of private vehicle occupancy
# B25044_001: tenure by vehicles available
# B09002_001: own children under 18
# B11004_001: related children under 18
census_vars <- paste0(c("B01001_001", "B19013_001", "B02001_002", "B25058_001", "B15003_022", "B25071_001", "B07201_001",
"B07201_002", "B25008_002", "B25008_003", "B99082_001", "B25044_001", "B09002_001",
"B11004_001"),"E") # census variables of interest that are available
medHHinc2019 <- 70459 # source: https://www.deptofnumbers.com/income/pennsylvania/philadelphia/
tracts19 <-
get_acs(geography = "block group",
variables = census_vars,
year = 2019, state = 42,
geometry = T, output = "wide") %>%
st_transform(crs = 2272) %>%
dplyr::select(!matches("M$")) %>%
rename(total_pop = B01001_001E,
med_hh_inc = B19013_001E,
white_pop = B02001_002E,
med_rent = B25058_001E,
bachelors25 = B15003_022E,
pct_rent_hhinc = B25071_001E,
mobility_tot_metro = B07201_001E,
samehouse1yr_metro = B07201_002E,
owner_occ = B25008_002E,
renter_occ = B25008_003E,
private_vehicle_occ = B99082_001E,
vehicles_avail = B25044_001E,
) %>%
mutate(children = B09002_001E + B11004_001E,
year = "2019") %>%
dplyr::select(!matches("^B|white_pop",ignore.case = F)) %>%
st_centroid()
# philly neighborhood data
nhoods_path <- 'https://raw.githubusercontent.com/azavea/geo-data/master/Neighborhoods_Philadelphia/Neighborhoods_Philadelphia.geojson'
nhoods <- st_read(nhoods_path, quiet = T) %>%
st_transform(crs = 2272) %>%
dplyr::select(mapname)
# philly bounds
philly <- st_read("https://opendata.arcgis.com/datasets/405ec3da942d4e20869d4e1449a2be48_0.geojson") %>%
st_transform(crs = 2272) %>%
dplyr::select(OBJECTID,geometry)
# limit permit dat to philly border
newcon_permits <- newcon_permits %>%
.[philly,]
# 311 incident reports data
carto_url = "https://phl.carto.com/api/v2/sql"
# Crime incidents
table_name_311 = "public_cases_fc"
# query
where_311 = "closed_datetime >= '2010-01-01' AND closed_datetime < '2020-01-01' AND service_name IN ('Rubbish/Recyclable Material Collection','Illegal Dumping','Construction Complaints', 'Building Construction', 'Sanitation Violation', 'Street Trees', 'Dangerous Sidewalk', 'Homeless Encampment Request', 'Parks and Rec Safety and Maintenance', 'Vacant House or Commercial')"
query311 = paste("SELECT *",
"FROM", table_name_311,
"WHERE", where_311)
reports311 = rphl::get_carto(query311, format = "csv", base_url = carto_url, stringsAsFactors = F)
reports311 <- reports311 %>%
mutate(Year = year(format.Date(reports311$closed_datetime)))
reports311 <- reports311 %>%
filter(!is.na(lon) & !is.na(lat)) %>%
st_as_sf(coords = c("lon","lat"), crs = 4326) %>%
st_transform(crs = 2272) %>%
mutate(
type = str_replace_all(service_name, "[^[:alnum:]]", ""),
Legend = "311") %>%
dplyr::select(Legend, type, geometry, Year)
# plot city council districts
ggplot() +
geom_sf(data=city_council, aes(fill=DISTRICT)) +
scale_fill_viridis(discrete = T) +
labs(fill = "City Council District",
title = "City Council Districts Within Philadelphia",
caption = "Figure 1.") +
mapTheme()
Development patterns may be spread over space, such that they forms a smooth ‘mosaic’ across the city of Philadelphia. Alternatively, it is possible that several permits may be clustered together, either because they were all submitted as part of the same construction project (ie, submitting an electrical permit and roofing permit for the same property) or there were several units being worked in tandem. To better understand whether Philadelphia is experiencing a ‘mosaic’ of development or if development has been clustered into particular areas, a fishnet was established for grid cells of 500 meter by 500 meters across the city. The 500 meter scale (1/3rd mile) allows for a fluid understanding of development patterns at the block scale.
Figure 1 below shows the number of permits granted for each fishnet square in 2014, followed by Figure 2 showing the number of permits for each fishnet square in 2015, and so on until Figure 6 demonstrating the number of permits granted for each fishnet square in 2019. As is visible through the maps, trends and patterns of development shift across the city, such that in 2014 development centered around the Center City Neighborhood, while moving towards 2019 - when record number of permits were granted for the time period in focus - development was centered in Center City as well as North Philadelphia.
It is worthwhile to note that while there are a number of areas with extensive permit counts and development pressures, so too are there a significant number of fishnet cells in the city that did not experience much - if any - development. These low-no development areas, from the perspective of new construction, could have an outside influence on the model with impacts on its accuracy in areas experiecing higher levels of development. This was factored into the model through the use of a Poisson distribution, described in greater detail below.
# our CRS is in feet, but want to make fishnet a 500m x 500m
cell_size <- 500 * 3.28084
fishnet <-
st_make_grid(philly,
cellsize = cell_size,
square = TRUE,
crs = 2272) %>%
.[philly] %>%
st_sf() %>%
mutate(uniqueID = 1:n())
# 2013
{
permit_net13 <-
dplyr::select(newcon_permits %>% filter(year == "2013")) %>%
mutate(count_permits = 1) %>%
aggregate(., fishnet, sum) %>%
mutate(count_permits13 = replace_na(count_permits, 0),
uniqueID = as.numeric(rownames(.)),
cvID = sample(round(nrow(fishnet) / 16), size=nrow(fishnet), replace = TRUE))
}
# 2014
{
permit_net14 <-
dplyr::select(newcon_permits %>% filter(year == "2014")) %>%
mutate(count_permits = 1) %>%
aggregate(., fishnet, sum) %>%
mutate(count_permits14 = replace_na(count_permits, 0),
uniqueID = as.numeric(rownames(.)),
cvID = sample(round(nrow(fishnet) / 16), size=nrow(fishnet), replace = TRUE)) %>%
dplyr::select(-count_permits)
}
# 2015
{
permit_net15 <-
dplyr::select(newcon_permits %>% filter(year == "2015")) %>%
mutate(count_permits = 1) %>%
aggregate(., fishnet, sum) %>%
mutate(count_permits15 = replace_na(count_permits, 0),
uniqueID = as.numeric(rownames(.)),
cvID = sample(round(nrow(fishnet) / 16), size=nrow(fishnet), replace = TRUE)) %>%
dplyr::select(-count_permits)
}
# 2016
{
permit_net16 <-
dplyr::select(newcon_permits %>% filter(year == "2016")) %>%
mutate(count_permits = 1) %>%
aggregate(., fishnet, sum) %>%
mutate(count_permits16 = replace_na(count_permits, 0),
uniqueID = as.numeric(rownames(.)),
cvID = sample(round(nrow(fishnet) / 16), size=nrow(fishnet), replace = TRUE)) %>%
dplyr::select(-count_permits)
}
# 2017
{
permit_net17 <-
dplyr::select(newcon_permits %>% filter(year == "2017")) %>%
mutate(count_permits = 1) %>%
aggregate(., fishnet, sum) %>%
mutate(count_permits17 = replace_na(count_permits, 0),
uniqueID = as.numeric(rownames(.)),
cvID = sample(round(nrow(fishnet) / 16), size=nrow(fishnet), replace = TRUE)) %>%
dplyr::select(-count_permits)
}
# 2018
{
permit_net18 <-
dplyr::select(newcon_permits %>% filter(year == "2018")) %>%
mutate(count_permits = 1) %>%
aggregate(., fishnet, sum) %>%
mutate(count_permits18 = replace_na(count_permits, 0),
uniqueID = as.numeric(rownames(.)),
cvID = sample(round(nrow(fishnet) / 16), size=nrow(fishnet), replace = TRUE)) %>%
dplyr::select(-count_permits)
}
# 2019
{
permit_net19 <-
dplyr::select(newcon_permits %>% filter(year == "2019")) %>%
mutate(count_permits = 1) %>%
aggregate(., fishnet, sum) %>%
mutate(count_permits19 = replace_na(count_permits, 0),
uniqueID = as.numeric(rownames(.)),
cvID = sample(round(nrow(fishnet) / 16), size=nrow(fishnet), replace = TRUE)) %>%
dplyr::select(-count_permits)
}
# combine permit counts from all years
permit_net_allyrs <- permit_net13 %>%
st_drop_geometry() %>%
dplyr::select(uniqueID,cvID,count_permits13) %>%
left_join(permit_net14 %>% st_drop_geometry() %>% dplyr::select(uniqueID,count_permits14), by = "uniqueID") %>%
left_join(permit_net15 %>% st_drop_geometry() %>% dplyr::select(uniqueID,count_permits15), by = "uniqueID") %>%
left_join(permit_net16 %>% st_drop_geometry() %>% dplyr::select(uniqueID,count_permits16), by = "uniqueID") %>%
left_join(permit_net17 %>% st_drop_geometry() %>% dplyr::select(uniqueID,count_permits17), by = "uniqueID") %>%
left_join(permit_net18 %>% st_drop_geometry() %>% dplyr::select(uniqueID,count_permits18), by = "uniqueID") %>%
left_join(permit_net19 %>% st_drop_geometry() %>% dplyr::select(uniqueID,count_permits19), by = "uniqueID") %>%
left_join(permit_net13 %>% dplyr::select(-c(cvID,count_permits13,count_permits)), by = "uniqueID")
# map of permit count over time
permit_net_formap <- permit_net_allyrs %>%
dplyr::select(-c(uniqueID,cvID,count_permits14,count_permits16,count_permits18)) %>%
rename(`Permit Count 2013` = count_permits13,
`Permit Count 2015` = count_permits15,
`Permit Count 2017` = count_permits17,
`Permit Count 2019` = count_permits19) %>%
gather("count_permits", "value", -geometry) %>%
st_as_sf()
permit_count_map <- ggplot(data = permit_net_formap) +
geom_sf(aes(fill = value)) +
scale_fill_viridis() +
facet_wrap(~count_permits) +
labs(title = "Count of New Construction Permits for the fishnet",
caption = "Figure 2.") +
mapTheme(title_size = 14) + theme(legend.position="bottom")
permit_count_map
There are a number of different methods to account for neighborhood change and the impacts of development, all of which are powered. The GENTRISK takes into account several census features, which are described below (along with their reasoning)
The model powering GENTRISK requires that all of the variables are gathered in the same format and then compared across time. To facilitatate this process, the PHL GEO Inc. team used the following code to gather all of the variables and put them into a similar format. These variables can then provide an understnaidng of how neighborhoods are changing over time, followed by an investigation of what could be driving those changes; ie, if development is happening in a neighborhood, what are the factors that drive that development?
# function to make fishnet from acs dataframe (dat_tract), for variable (var_name), giving it the legend (var_name), aggregating with function (sum_or_mean)
# for example, to make a fishnet summing all total_pop variables for 2019, one would define the variables as follows:
# dat_tract <- tracts19; var_name <- "total_pop"; sum_or_mean <- sum
makenet <- function(dat_tract, var_name, sum_or_mean){
net_tract <- dat_tract %>%
.[philly,] %>%
dplyr::select(matches(var_name)) %>%
mutate(Legend = var_name) %>%
st_join(fishnet, join=st_within) %>%
st_drop_geometry() %>%
group_by(uniqueID, Legend) %>%
summarize(count = sum_or_mean(get(var_name), na.rm = T)) %>%
left_join(fishnet, ., by = "uniqueID") %>% # add geometry back in
spread(Legend, count, fill=0) %>% # fill in ones where fishnet was missing, count was NA with 0
dplyr::select(-`<NA>`) %>%
ungroup()
return(net_tract)
}
var_names <- names(tracts19)[3:14]
# for loop to create fish nets for all ACS variables for all years
for (i in 1:length(var_names)){
# select year, variable, and legend name
var_name <- var_names[i]
if (i %in% c(2:4,18)){
sum_or_mean <- mean
} else {
sum_or_mean <- sum
}
# run function to make fishnet
net_tract <- makenet(tracts19,var_name,sum_or_mean)
# save as individual fishnet, uncomment if neededd
# net_tract_name <- paste0("net_",var_name,yr)
# assign(net_tract_name, net_tract)
# save all variables into one big net
if (i == 1) {
census_net <- net_tract %>%
st_drop_geometry()
} else {
net_tract_tojoin <- net_tract %>%
st_drop_geometry() %>%
dplyr::select(1:2)
census_net <- left_join(census_net, net_tract_tojoin, by = "uniqueID")
}
}
In addition to census information, the GENTRSIK takes into account other factors and amenities which could increase the likelihood of development and subsequent pressures. The incorporation of these factors builds on a geospatial risk model where the likelihood that that an increase in the following factors would correspond to an increase in the number of permits granted. A resident could use this to ask the question “given certain characteristics about my neighborhood, what is the likelihood, and the amount, of permit granting that I can expect to see?”
A primary challenge of using data in these models is that it is available across multiple scales and times; to counteract this challenge the GENTRISK model focuses on the distribution of the following factors from 2013 to 2018, and then uses this information to validate its results in 2019. Information was gathered from 311 calls sent to the city, particularly as they relate to new construction impacts. As 311 calls are openly accessible to all, they represent data that is much more flexible and inclusive than data solely collected by the city.
Similar to permits,
311 Variables to include are: - Dangerous Sidewalk complaints - Illegal dumping complaints - Parks and Recreation Safety and Maintenance complaints - Rubbish / Recyclable Material Collection complaints - Street Tree maintenance complaints - And the distance between those variables to eachother.
long_15 <- reports311 %>%
filter(Year == 2015) %>%
arrange(type) %>%
mutate(Legend = paste0(type, Year), .keep = "unused")
long_16 <- reports311 %>%
filter(Year == 2016) %>%
arrange(type) %>%
mutate(Legend = paste0(type, Year), .keep = "unused")
long_17 <- reports311 %>%
filter(Year == 2017) %>%
arrange(type) %>%
mutate(Legend = paste0(type, Year), .keep = "unused")
long_18 <- reports311 %>%
filter(Year == 2018) %>%
arrange(type) %>%
mutate(Legend = paste0(type, Year), .keep = "unused")
long_19 <- reports311 %>%
filter(Year == 2019) %>%
arrange(type) %>%
mutate(Legend = paste0(type, Year), .keep = "unused")
vars311_net <- rbind(long_15, long_16, long_17, long_18, long_19) %>%
st_join(fishnet, join = st_within) %>%
st_drop_geometry() %>%
group_by(uniqueID, Legend) %>%
summarize(count = n()) %>%
left_join(fishnet, ., by = "uniqueID") %>% # add geometry back in
spread(Legend, count, fill=0) %>% # fill in ones where fishnet was missing, count was NA with 0
dplyr::select(-`<NA>`) %>%
ungroup()
# use something like code below (from lab 6) to create nearest neighbor counts
# convenience to reduce length of function names.
st_c <- st_coordinates
st_coid <- st_centroid
vars311_net_ccoid <- st_c(st_coid(vars311_net))
# add nearest neighbor features
vars311_net <- vars311_net %>%
mutate(buildingCon15.dist = nn_function(vars311_net_ccoid,
st_c(long_15 %>% filter(Legend == "BuildingConstruction2015")), 3),
dangerSide15.dist = nn_function(vars311_net_ccoid,
st_c(long_15 %>% filter(Legend == "DangerousSidewalk2015")), 3),
illegalDump15.dist = nn_function(vars311_net_ccoid, st_c(long_15 %>% filter(Legend == "IllegalDumping2015")), 3),
parksNrec15.dist = nn_function(vars311_net_ccoid,
st_c(long_15 %>% filter(Legend == "ParksandRecSafetyandMaintenance2015")), 3),
materialColl15.dist = nn_function(
vars311_net_ccoid, st_c(long_15 %>% filter(Legend == "RubbishRecyclableMaterialCollection2015")), 3),
streetTrees15.dist = nn_function(vars311_net_ccoid, st_c(long_15 %>% filter(Legend == "StreetTrees2015")), 3),
vacant15.dist = nn_function(vars311_net_ccoid,
st_c(long_15 %>% filter(Legend == "VacantHouseorCommercial2015")), 3),
buildingCon16.dist = nn_function(vars311_net_ccoid,
st_c(long_16 %>% filter(Legend == "BuildingConstruction2016")), 3),
dangerSide16.dist = nn_function(vars311_net_ccoid,
st_c(long_16 %>% filter(Legend == "DangerousSidewalk2016")), 3),
illegalDump16.dist = nn_function(vars311_net_ccoid, st_c(long_16 %>% filter(Legend == "IllegalDumping2016")), 3),
parksNrec16.dist = nn_function(vars311_net_ccoid,
st_c(long_16 %>% filter(Legend == "ParksandRecSafetyandMaintenance2016")), 3),
materialColl16.dist = nn_function(
vars311_net_ccoid, st_c(long_16 %>% filter(Legend == "RubbishRecyclableMaterialCollection2016")), 3),
streetTrees16.dist = nn_function(vars311_net_ccoid, st_c(long_16 %>% filter(Legend == "StreetTrees2016")), 3),
vacant16.dist = nn_function(vars311_net_ccoid,
st_c(long_16 %>% filter(Legend == "VacantHouseorCommercial2016")), 3),
buildingCon17.dist = nn_function(vars311_net_ccoid,
st_c(long_17 %>% filter(Legend == "BuildingConstruction2017")), 3),
dangerSide17.dist = nn_function(vars311_net_ccoid,
st_c(long_17 %>% filter(Legend == "DangerousSidewalk2017")), 3),
illegalDump17.dist = nn_function(vars311_net_ccoid, st_c(long_17 %>% filter(Legend == "IllegalDumping2017")), 3),
parksNrec17.dist = nn_function(vars311_net_ccoid,
st_c(long_17 %>% filter(Legend == "ParksandRecSafetyandMaintenance2017")), 3),
materialColl17.dist = nn_function(
vars311_net_ccoid, st_c(long_17 %>% filter(Legend == "RubbishRecyclableMaterialCollection2017")), 3),
streetTrees17.dist = nn_function(vars311_net_ccoid, st_c(long_17 %>% filter(Legend == "StreetTrees2017")), 3),
vacant17.dist = nn_function(vars311_net_ccoid,
st_c(long_17 %>% filter(Legend == "VacantHouseorCommercial2017")), 3),
buildingCon18.dist = nn_function(vars311_net_ccoid,
st_c(long_18 %>% filter(Legend == "BuildingConstruction2018")), 3),
dangerSide18.dist = nn_function(vars311_net_ccoid,
st_c(long_18 %>% filter(Legend == "DangerousSidewalk2018")), 3),
illegalDump18.dist = nn_function(vars311_net_ccoid, st_c(long_18 %>% filter(Legend == "IllegalDumping2018")), 3),
parksNrec18.dist = nn_function(vars311_net_ccoid,
st_c(long_18 %>% filter(Legend == "ParksandRecSafetyandMaintenance2018")), 3),
materialColl18.dist = nn_function(
vars311_net_ccoid, st_c(long_18 %>% filter(Legend == "RubbishRecyclableMaterialCollection2018")), 3),
streetTrees18.dist = nn_function(vars311_net_ccoid, st_c(long_18 %>% filter(Legend == "StreetTrees2018")), 3),
vacant18.dist = nn_function(vars311_net_ccoid,
st_c(long_18 %>% filter(Legend == "VacantHouseorCommercial2018")), 3),
buildingCon19.dist = nn_function(vars311_net_ccoid,
st_c(long_19 %>% filter(Legend == "BuildingConstruction2019")), 3),
dangerSide19.dist = nn_function(vars311_net_ccoid,
st_c(long_19 %>% filter(Legend == "DangerousSidewalk2019")), 3),
illegalDump19.dist = nn_function(vars311_net_ccoid, st_c(long_19 %>% filter(Legend == "IllegalDumping2019")), 3),
parksNrec19.dist = nn_function(vars311_net_ccoid,
st_c(long_19 %>% filter(Legend == "ParksandRecSafetyandMaintenance2019")), 3),
materialColl19.dist = nn_function(
vars311_net_ccoid, st_c(long_19 %>% filter(Legend == "RubbishRecyclableMaterialCollection2019")), 3),
streetTrees19.dist = nn_function(vars311_net_ccoid, st_c(long_19 %>% filter(Legend == "StreetTrees2019")), 3),
vacant19.dist = nn_function(vars311_net_ccoid,
st_c(long_19 %>% filter(Legend == "VacantHouseorCommercial2019")), 3))
# join census vars fishnet to permit count net + 311 net
all_net <- left_join(permit_net_allyrs, census_net, by = "uniqueID") %>%
left_join(vars311_net %>% st_drop_geometry(), by = "uniqueID") %>%
st_as_sf()
# make a subset of the net with variables we're actually interested in putting in the model
# add neighborhood names to data
# subset_net <- all_net %>%
# st_centroid() %>%
# st_join(dplyr::select(nhoods, mapname)) %>%
# st_drop_geometry() %>%
# left_join(dplyr::select(all_net, geometry, uniqueID), by = "uniqueID") %>%
# st_sf() %>%
# na.omit()
ggplot()+
geom_sf(data=all_net,aes(fill=all_net$total_pop))+
labs(title = "Total Population in from 2013-2018", fill = "Total Population in the fishnet, from 2013 - 2018")
# ggplot()+
# geom_sf(data=all_net,aes(fill=all_net$`Total Population 15`))+
# labs(title = "Total Population in 2015", fill = "Total Population in 2015")
# ggplot()+
# geom_sf(data=all_net,aes(fill=all_net$`Total Population 16`))+
# labs(title = "Total Population in 2016", fill = "Total Population in 2016")
# ggplot()+
# geom_sf(data=all_net,aes(fill=all_net$`Total Population 17`))+
# labs(title = "Total Population in 2017", fill = "Total Population in 2017")
# ggplot()+
# geom_sf(data=all_net,aes(fill=all_net$`Total Population 18`))+
# labs(title = "Total Population in 2018", fill = "Total Population in 2018")
# ggplot()+
# geom_sf(data=all_net,aes(fill=all_net$`Total Population 19`))+
# labs(title = "Total Population in 2019", fill = "Total Population in 2019")
<<<<<<< HEAD #### Figure 3. Correlation Matrix of all Predictors =======
All census variables were pulled at the block group level, which is a smaller scale than the 500x500 meter fishnet used in this analysis. As there is no perfect way to join block-group-level census data to our fishnet, we joined the most center point of each block group to the fishnet, and aggregated data where necessary. As an example, if a fishnet encompassed census blocks A, B, and C, with respective populations of 100,150, and 125, then the total fishnet population would be 375. This enables a smooth comparison across the surface of Philadelphia. In some cases where the centroid of the block group falls outside the boundary of the fishnet square, the resulting fishnet square population would be 0. While this introduces a certain level of error into the calculations, the error can be calculated and corrective actions and more informed decisions can be made. ## Figure 7. Correlation Matrix of all possible risk factors
With all of the potential variables to incorporate, it is important to evaluate which factors have the greatest potential influence on the prediction of the permit count (the proxy for development), which can be done through the chart below, also called a Correlation Matrix. In the matrix, boxes with a deep red color indicate a positive relationship, such that as the value increases, the predicted permit count increases. Boxes with a deep blue color indicate that as the variable increases in value, the predicted permit count decreases. We established our model to use information from 2013 - 2018, and test our predictions on 2019, the last pre-covid impacted year. Using these relationships we can establish clear linkages between 2019 predictions and future predictions.
From the matrix below, we can see that previous year’s permit count (ie, the development that occured in 2018) emerged as the single most important factor the count of future permits (in this case, modelling the development that would occur in 2019).
The strength of the relationships is 0.9, such that a 1% increase in the number of permits granted for new construction in the year prior would net an 0.9% increase in the count of permits for the year in which we are predicting. Earlier years also have a relationship, such that a 1% increase in the number of permits granted three years prior (in 2016) would explain about a 0.6% increase in the number of permits in the target year.
Non-temporal factors that have a high influence include 311 Building Construction Complaints (1% increase means a 0.6% increase in permit count), Illegal Dumping complaints (1% increase means a 0.5% increase in permit count), and Rubbish/Recyclable Material Collection complaints (1% increase explains a 0.4% increase in permit count). >>>>>>> 5097e5bb2c0f527552d6019ea3c0d7f4ba308bd0
# making this moreso to see which would actually be good predictors
var_for_model <- c("uniqueID","cvID","count_permits19","count_permits18","count_permits17","count_permits16",
"count_permits15","BuildingConstruction2019","IllegalDumping2019","DangerousSidewalk2019",
"RubbishRecyclableMaterialCollection2019","StreetTrees2019","parksNrec19.dist",
"VacantHouseorCommercial2019","vehicles_avail","med_rent","renter_occ","total_pop",
"samehouse1yr_metro","bachelors25","children")
all_net19 <- all_net %>%
dplyr::select(all_of(var_for_model))
# only keep years for count_permit columns
names(all_net19) <- c(str_replace_all(var_for_model, c("2019" = "","19." = ".")), "geometry")
for_cormat19 <- all_net19 %>%
st_drop_geometry() %>%
dplyr::select(-c(uniqueID,cvID))
ggcorrplot(
round(cor(for_cormat19), 1),
p.mat = cor_pmat(for_cormat19),
colors = c("#4b2875", "white", "#9c1339"),
type="lower",
insig = "blank",
digits = 4,
lab = T, lab_size = 2)+
labs(caption = "Figure 3")
We also added some features that explain the spatial process of permit issuance in Philadelphia. Using local Moran’s I, we examined the spatial process which helps us understand whether the permit count at a certain location is randomly distributed or clustered relative to its immediate neighbors.
Recall that our goal is to assess the amount of development that will occur in the future, using as a proxy permits granted for new construction. To assess future oncsutrction, the model used gathers the raw permit counts as well as the variables described in the correlation matrix from 2013 to 2018, and then will predict on the permit count in 2019. However, as is shown in figure 4 below, as the majority of fishnet squares do not see permit applications in a given year. In order to evaluate the true relationships between the variables it is important to factor out the influence of the low-no values through the use of a Poisson regression. While low-no values may also be due to errors in changing the spatial scale of the census blocks to the fishnet squares, the majority of these are due to no permit applications. A Poisson regression will minimize the influence of the low-no counts.
all_net19 %>% ggplot(aes(x = count_permits19)) +
geom_histogram(bins = 66, fill = viridis::cividis(1)) +
theme(text = element_text( color = "black"),
plot.title = element_text(size = 14, colour = "black"),
plot.subtitle = element_text(face="italic"),
plot.caption = element_text(hjust=0),
axis.ticks = element_blank(),
panel.background = element_blank(),
panel.grid.major = element_line("grey80", size = 0.1),
panel.grid.minor = element_blank(),
panel.border = element_rect(colour = "black", fill=NA, size=2),
strip.background = element_rect(fill = "grey80", color = "white"),
strip.text = element_text(size=12),
axis.title = element_text(size=12),
axis.text = element_text(size=10),
plot.background = element_blank(),
legend.background = element_blank(),
legend.title = element_text(colour = "black", face = "italic"),
legend.text = element_text(colour = "black", face = "italic"),
strip.text.x = element_text(size = 14)) +
labs(title = "Permit Distribution",
x = "Count of New Construction Permits", y = "Count",
caption = "Figure 4.")
In addition to controlling for the influence of low-no permit count fishnets, it is important to accoutn for any spatial variability in the model, so that the clusters of permit counts, as described earlier, do not spatially correlate with other variables. If there is a spatial auto-correlation between the two, this could provide an out sized influence for certain variables over others and influence the results. Figure 5 demonstrates certain spatial trends, such as the fact that building construction and rubbish 311 complaints are the largest in Southeast Philadelphia (City Council District 2), while the Center City area has the highest concentration of renter-occupied housing as well as individuals holding a bachelors degree.
# add neighborhood names
allnet19_formodel <- all_net19 %>%
st_centroid() %>%
st_join(dplyr::select(nhoods, mapname)) %>%
st_drop_geometry() %>%
left_join(dplyr::select(all_net19, geometry, uniqueID), by = "uniqueID") %>%
st_sf()
vars_net.long <- gather(allnet19_formodel %>% dplyr::select(-count_permits19),
variable, value, -geometry, -uniqueID, -cvID, -mapname)
vars <- unique(vars_net.long$variable)
varList <- list()
for (i in vars) {
varList[[i]] <- ggplot() +
geom_sf(data = filter(vars_net.long, variable == i),aes(fill = value), colour=NA) +
scale_fill_viridis(name = "") +
labs(title = i) +
theme(text = element_text( color = "black"),
plot.title = element_text(size = 14,colour = "black"),
plot.subtitle=element_text(face="italic"),
plot.caption=element_text(hjust=0),
axis.ticks = element_blank(),
panel.background = element_blank(),axis.title = element_blank(),
axis.text = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(colour = "black", fill=NA, size=2),
strip.text.x = element_text(size = 7),
legend.position = "bottom",
legend.text = element_text(size = 5))
}
do.call(grid.arrange,c(varList, ncol = 3, top = "Predictors of Permit Count (on fishnet)", bottom = "Figure 5."))
[description of figure 6]
To assess where relationships between the input variables described above might truly be driving development (using the relationships established in figure 3), a Local Moran’s I test was done to establish if the new construction permits were randomly distributed through space, or if they were clustered together in a non-random fashion. A P-test was established as well in order to focus on statistically significant hot spots, which emerged in fishnet cells slightly north, in, and south of the Center City area.
final_net.nb <- poly2nb(as_Spatial(allnet19_formodel), queen=TRUE)
final_net.weights <- nb2listw(final_net.nb, style="W", zero.policy=TRUE) # turn neighborhood weights into list of weights
local_morans <- localmoran(allnet19_formodel$count_permits19, final_net.weights, zero.policy=TRUE) %>%
as.data.frame() # Ii moran's I at ith cell, Ei expected/mean from neighbors
# join local Moran's I results to fishnet
final_net.localMorans <-
cbind(local_morans, as.data.frame(allnet19_formodel)) %>%
st_sf() %>%
dplyr::select(`Permit Count` = count_permits19,
`Local Morans I` = Ii,
`P Value` = `Pr(z != E(Ii))`) %>%
mutate(`Significant Hotspots` = ifelse(`P Value` <= 0.001, 1, 0)) %>%
gather(variable, value, -geometry)
# now plot
vars <- unique(final_net.localMorans$variable)
varList <- list()
for(i in vars){
varList[[i]] <-
ggplot() +
geom_sf(data = filter(final_net.localMorans, variable == i),aes(fill = value), colour=NA) +
scale_fill_viridis(name="") +
labs(title=i) +
mapTheme(title_size = 14) + theme(legend.position="bottom")
}
do.call(grid.arrange,c(varList, ncol = 2, top = "Local Moran's I Statistics for Permit Count in Philadelphia",
bottom = "Figure 6."))
final_net <-
allnet19_formodel %>%
mutate(permitct.isSig =
ifelse(localmoran(allnet19_formodel$count_permits19,
final_net.weights)[,5] <= 0.001, 1, 0)) %>%
mutate(permitct.isSig.dist =
nn_function(st_coordinates(st_centroid(allnet19_formodel)),
st_coordinates(st_centroid(
filter(allnet19_formodel, permitct.isSig == 1))), 1))
Revisiting the variables described in Figure 3, the following scatterplots in Figure 7 demonstrate that there are significant relationships between certain factors including the previous year’s count (count18)
correlation.long <-
st_drop_geometry(final_net) %>%
dplyr::select(-uniqueID, -cvID, -mapname) %>%
gather(variable, value, -count_permits19)
correlation.cor <-
correlation.long %>%
group_by(variable) %>%
summarize(correlation = cor(value, count_permits19, use = "complete.obs"))
ordered_vars <- c("count_permits18","count_permits17","count_permits16","count_permits15","total_pop","bachelors25",
"children","med_rent","renter_occ","samehouse1yr_metro","vehicles_avail","BuildingConstruction",
"IllegalDumping","DangerousSidewalk","StreetTrees","RubbishRecyclableMaterialCollection",
"VacantHouseorCommercial","parksNrec.dist","permitct.isSig","permitct.isSig.dist")
ggplot(correlation.long %>%
mutate(variable = factor(variable, levels = ordered_vars)),
aes(value, count_permits19)) +
geom_point(size = 0.1) +
geom_text(data = correlation.cor %>%
mutate(variable = factor(variable, levels = ordered_vars)),
aes(label = paste("r =", round(correlation, 2))),
x=-Inf, y=Inf, vjust = 1.5, hjust = -.1) +
geom_smooth(method = "lm", se = FALSE, colour = "black") +
facet_wrap(~variable, ncol = 3, scales = "free") +
labs(title = "Permit count as a function of predictors",
caption = "Figure 7.") +
theme(text = element_text( color = "black"),
plot.title = element_text(size = 14, colour = "black"),
plot.subtitle = element_text(face="italic"),
plot.caption = element_text(hjust=0),
axis.ticks = element_blank(),
panel.background = element_blank(),
panel.grid.major = element_line("grey80", size = 0.1),
panel.grid.minor = element_blank(),
panel.border = element_rect(colour = "black", fill=NA, size=2),
strip.background = element_rect(fill = "grey80", color = "white"),
strip.text = element_text(size=12),
axis.title = element_text(size=12),
axis.text = element_text(size=10),
plot.background = element_blank(),
legend.background = element_blank(),
legend.title = element_text(colour = "black", face = "italic"),
legend.text = element_text(colour = "black", face = "italic"),
strip.text.x = element_text(size = 10))
# just risk factors
reg.vars <- c("count_permits18","count_permits17","count_permits16","count_permits15","BuildingConstruction",
"IllegalDumping","DangerousSidewalk","RubbishRecyclableMaterialCollection","vehicles_avail",
"med_rent","renter_occ","VacantHouseorCommercial","total_pop","samehouse1yr_metro","bachelors25",
"StreetTrees","children","parksNrec.dist")
# random k-fold CV
reg.CV <- crossValidate(dataset = final_net,
id = "cvID",
dependentVariable = "count_permits19",
indVariables = reg.vars) %>%
dplyr::select(cvID, count_permits19, Prediction, geometry) %>%
mutate(error = count_permits19 - Prediction)
# MAE
reg.MAE <- mean(abs(reg.CV$error)) # ~7.573239
# with local Moran's I spatial process features
reg.sp.vars <- c(reg.vars,"permitct.isSig","permitct.isSig.dist")
## RUN REGRESSIONS
reg.spatialCV <- crossValidate(
dataset = final_net,
id = "cvID",
dependentVariable = "count_permits19",
indVariables = reg.sp.vars) %>%
dplyr::select(cvID, count_permits19, Prediction, geometry) %>%
mutate(error = count_permits19 - Prediction)
# MAE
reg.spatial.MAE <- mean(abs(reg.spatialCV$error)) # ~5.595413
# LOGO-CV
# needed to redefine crossValidate to solve the issue with a neighborhood of NA
# this function also works for the previous lines using crossValidate()
crossValidate <- function(dataset, id, dependentVariable, indVariables) {
allPredictions <- data.frame()
# cvID_list <- unique(dataset[[id]])
cvID_list <- as.character(na.omit(unique(dataset[[id]])))
for (i in cvID_list) {
thisFold <- i
cat("This hold out fold is", thisFold, "\n")
fold.train <- filter(dataset, dataset[[id]] != thisFold) %>% as.data.frame() %>%
dplyr::select(id, geometry, all_of(indVariables),
all_of(dependentVariable))
fold.test <- filter(dataset, dataset[[id]] == thisFold) %>% as.data.frame() %>%
dplyr::select(id, geometry, all_of(indVariables),
all_of(dependentVariable))
form_parts <- paste0(dependentVariable, " ~ ", paste0(indVariables, collapse = "+"))
form <- as.formula(form_parts)
regression <- glm(form, family = "poisson",
data = fold.train %>%
dplyr::select(-geometry, -id))
thisPrediction <-
mutate(fold.test, Prediction = predict(regression, fold.test, type = "response"))
allPredictions <-
rbind(allPredictions, thisPrediction)
}
return(st_sf(allPredictions))
}
# adding neighborhood for LOGO CV, risk factors only
reg.logoCV <- crossValidate(
dataset = final_net,
id = "mapname",
dependentVariable = "count_permits19",
indVariables = reg.vars) %>%
dplyr::select(cvID = mapname, count_permits19, Prediction, geometry) %>%
mutate(error = count_permits19 - Prediction)
# MAE
reg.logo.MAE <- mean(abs(reg.logoCV$error)) # 8.273702
# risk factors + spatial process
reg.logo.spatialCV <- crossValidate(
dataset = final_net,
id = "mapname",
dependentVariable = "count_permits19",
indVariables = reg.sp.vars) %>%
dplyr::select(cvID = mapname, count_permits19, Prediction, geometry) %>%
mutate(error = count_permits19 - Prediction)
# MAE
reg.logo.spatial.MAE <- mean(abs(reg.logo.spatialCV$error)) # 6.170643
[description of figure 8 & 9]
reg.summary <- rbind(
mutate(reg.CV,
Error = Prediction - count_permits19,
Regression = "Random k-fold CV: Predictors"),
mutate(reg.spatialCV,
Error = Prediction - count_permits19,
Regression = "Random k-fold CV: Predictors with Spatial Process"),
mutate(reg.logoCV,
Error = Prediction - count_permits19,
Regression = "Spatial LOGO-CV: Predictors"),
mutate(reg.logo.spatialCV,
Error = Prediction - count_permits19,
Regression = "Spatial LOGO-CV: Predictors with Spatial Process")) %>%
st_sf()
error_by_reg_and_fold <-
reg.summary %>%
group_by(Regression, cvID) %>%
summarize(Mean_Error = mean(Prediction - count_permits19, na.rm = T),
MAE = mean(abs(Mean_Error), na.rm = T),
SD_MAE = mean(abs(Mean_Error), na.rm = T)) %>%
ungroup()
# make map
# adjust mapTheme in order to fit title
mapTheme <- function(base_size = 12, title_size = 16) {
theme(
text = element_text( color = "black"),
plot.title = element_text(size = title_size,colour = "black"),
plot.subtitle=element_text(face="italic"),
plot.caption=element_text(hjust=0),
axis.ticks = element_blank(),
panel.background = element_blank(),axis.title = element_blank(),
axis.text = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(colour = "black", fill=NA, size=2),
strip.text.x = element_text(size = 7))
}
error_by_reg_and_fold %>%
ggplot() +
geom_sf(aes(fill = MAE)) +
facet_wrap(~Regression) +
scale_fill_viridis() +
labs(title = "Errors by Cross Validation method",
caption = "Figure 8.") +
mapTheme(title_size = 14) + theme(legend.position="bottom")
Here, we show the real vs predicted counts of new construction permits in 2019 (see figure 9).
reg.summary %>%
filter(Regression == "Random k-fold CV: Predictors with Spatial Process") %>%
dplyr::select(-c(cvID,error,Error)) %>%
rename(`Real Permit Count` = count_permits19) %>%
gather("counts","value",-geometry,-Regression) %>%
ggplot() +
geom_sf(aes(fill = value)) +
scale_fill_viridis() +
facet_wrap(~counts) +
labs(title = "Real vs Predicted Counts of Permits (2019)",
caption = "Figure 9.") +
theme(
text = element_text( color = "black"),
plot.title = element_text(size = 14,colour = "black"),
plot.subtitle=element_text(face="italic"),
plot.caption=element_text(hjust=0),
axis.ticks = element_blank(),
panel.background = element_blank(),axis.title = element_blank(),
axis.text = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(colour = "black", fill=NA, size=2),
strip.text.x = element_text(size = 10),
legend.position = "bottom")
# adjust plotTheme to fit titles
plotTheme <- function(base_size = 12, title_size = 16) {
theme(
text = element_text( color = "black"),
plot.title = element_text(size = title_size, colour = "black"),
plot.subtitle = element_text(face="italic"),
plot.caption = element_text(hjust=0),
axis.ticks = element_blank(),
panel.background = element_blank(),
panel.grid.major = element_line("grey80", size = 0.1),
panel.grid.minor = element_blank(),
panel.border = element_rect(colour = "black", fill=NA, size=2),
strip.background = element_rect(fill = "grey80", color = "white"),
strip.text = element_text(size=12),
axis.title = element_text(size=12),
axis.text = element_text(size=10),
plot.background = element_blank(),
legend.background = element_blank(),
legend.title = element_text(colour = "black", face = "italic"),
legend.text = element_text(colour = "black", face = "italic"),
strip.text.x = element_text(size = 10)
)
}
error_by_reg_and_fold %>%
ggplot(aes(MAE)) +
geom_histogram(bins = 30, colour="black", fill = "#FDE725FF") +
facet_wrap(~Regression) +
geom_vline(xintercept = 0) + scale_x_continuous(breaks = seq(0, 450, by = 50)) +
labs(title="Distribution of MAE", subtitle = "k-fold cross validation vs. LOGO-CV",
x="Mean Absolute Error", y="Count",
caption = "Figure 10.") +
plotTheme(title_size = 14) + theme(legend.position="bottom")
[description of table 1] Table 1 below shows the summary of regressions, including the mean and standard deviation of MAE for both of the models and both of the cross validation methods. We highlighted the row showing the results of both cross validation methods for the model including spatial process, to make clear how much these spatial process features improved our model.
st_drop_geometry(error_by_reg_and_fold) %>%
group_by(Regression) %>%
summarize(`Mean MAE` = round(mean(MAE), 2),
`SD MAE` = round(sd(MAE), 2)) %>%
kable(caption = "Table 1: Summary of Regressions") %>%
kable_styling("striped", full_width = F) %>%
row_spec(c(2,4), bold = T, color = "white", background = viridis::cividis(1)) %>%
kable_classic(full_width = F, html_font = "Cambria")
| Regression | Mean MAE | SD MAE |
|---|---|---|
| Random k-fold CV: Predictors | 4.49 | 12.85 |
| Random k-fold CV: Predictors with Spatial Process | 2.76 | 4.07 |
| Spatial LOGO-CV: Predictors | 10.08 | 31.36 |
| Spatial LOGO-CV: Predictors with Spatial Process | 6.74 | 14.61 |
In order to assess if the model generalizes to different neighborhood contexts, we tested whether the model generalizes to a racial context. Using 2019 US Census data, I defined a neighborhood to be “Majority White” if over 50% of the population was White, and “Majority non-White” otherwise.
[description of table 2]
RaceContext <- get_acs(geography = "block group",
variables = c("B01001_001E","B02001_002E"),
year = 2019, state = 42,
geometry = T, output = "wide") %>%
st_transform(crs = 2272) %>%
dplyr::select(!matches("M$")) %>%
rename(total_pop = B01001_001E,
white_pop = B02001_002E) %>%
mutate(pct_white = ifelse(total_pop > 0, white_pop / total_pop,0),
RaceContext = ifelse(pct_white > 0.5, "Majority White",
ifelse(total_pop != 0 , "Majority non-White", NA))) %>%
dplyr::select(-c(NAME,total_pop,white_pop,pct_white)) %>%
.[nhoods,]
reg.summary %>%
filter(grepl("LOGO", Regression)) %>%
st_centroid() %>%
st_join(RaceContext) %>%
na.omit() %>%
st_drop_geometry() %>%
group_by(Regression, RaceContext) %>%
summarize(mean.Error = mean(Error, na.rm = T)) %>%
spread(RaceContext, mean.Error) %>%
kable(caption = "Table 2. Mean Error by Neighborhood Racial Context") %>%
kable_styling("striped", full_width = F) %>%
row_spec(2, bold = T, color = "white", background = viridis::cividis(1)) %>%
kable_classic(html_font = "Cambria")
| Regression | Majority non-White | Majority White |
|---|---|---|
| Spatial LOGO-CV: Predictors | 2.9338193 | 0.4763929 |
| Spatial LOGO-CV: Predictors with Spatial Process | 0.4727099 | 0.1580031 |
Similarly, we tested the generalizability of our model in an income context.
IncomeContext <- get_acs(geography = "block group",
variables = c("B19013_001E"),
year = 2019, state = 42,
geometry = T, output = "wide") %>%
st_transform(crs = 2272) %>%
dplyr::select(!matches("M$")) %>%
rename(med_hh_inc = B19013_001E) %>%
mutate(IncomeContext = ifelse(med_hh_inc > medHHinc2019, "Above Median Income", "Below Median Income")) %>%
dplyr::select(-c(NAME,med_hh_inc)) %>%
.[nhoods,]
reg.summary %>%
filter(grepl("LOGO", Regression)) %>%
st_centroid() %>%
st_join(IncomeContext) %>%
na.omit() %>%
st_drop_geometry() %>%
group_by(Regression, IncomeContext) %>%
summarize(mean.Error = mean(Error, na.rm = T)) %>%
spread(IncomeContext, mean.Error) %>%
kable(caption = "Table 3. Mean Error by Neighborhood Income Context") %>%
kable_styling("striped", full_width = F) %>%
row_spec(2, bold = T, color = "white", background = viridis::cividis(1)) %>%
kable_classic(html_font = "Cambria")
| Regression | Above Median Income | Below Median Income |
|---|---|---|
| Spatial LOGO-CV: Predictors | 0.2946100 | 2.9462298 |
| Spatial LOGO-CV: Predictors with Spatial Process | 0.3275409 | 0.7222918 |
In conclusion….